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Abstract. This paper proposes a general framework for compressed sensing of constrained 
joint sparsity (CJS) which includes total variation minimization (TV-min) as an example. 
TV- and 2-norm error bounds, independent of the ambient dimension, are derived for the 
CJS version of Basis Pursuit and Orthogonal Matching Pursuit. As an application the 
results extend Candes, Romberg and Tao's proof of exact recovery of piecewise constant 
objects with noiseless incomplete Fourier data to the case of noisy data. 



I. Introduction 

One of the most significant developments in imaging and signal processing of the last 
decade is compressive sensing (CS) which promises reconstruction with fewer data than the 
ambient dimension. The CS capability (HI E] hinges on favorable sensing matrices and 
enforcing a key prior knowledge, i.e. sparse objects. 

Consider the linear inverse problem Y = <&X+E where X G C m is the sparse object vector 
to be recovered, Y G C n is the measurement data vector and E G C n represents the (model 
or external) errors. The great insight of CS is that the sparseness of X, as measured by the 
sparsity ||X||o = # nonzero elements in X, can be effectively enforced by Ll-minimization 
(Ll-min) [TTJ [20] 

(1) minll^Ha subject to (s.t.) \\&Z - Y\\ 2 < \\E\\ 2 

with favorable sensing matrices 

The Ll-min idea dates back to geophysics research in 1970's [T3| 1ST]. The Ll-minimizer is 
often a much better approximation to the sparse object than the traditional minimum energy 
solution via L2-minimization because 1-norm is closer to || • ||o than the 2-norm. Moreover, 
the Ll-min principle is a convex optimization problem and can be efficiently computed. The 
Ll-min principle is effective in recovering the sparse object with the number of data n much 
less than the ambient dimension m if the sensing matrix $ satisfies some favorable conditions 
such as the restricted isometry property (RIP) |5J: $ is said to satisfy RIP of order k if 

(2) (l-5 k )\\Z\\l 2 <\\&Z\\i<(l + 5 k )\\Z\\l 

for any /c-sparse vector Z where the minimum of such constant 5k is the restricted isometry 
constant (RIC) of order k. 

The drawback of RIP is that only a few special types of matrices are known to satisfy RIP, 
including independently and identically distributed (i.i.d.) random matrices and random 
partial Fourier matrices formed by random row selections of the discrete Fourier transform. 



A more practical alternative CS criterion is furnished by the incoherence property as 
measured by one minus the mutual coherence 

(3) = max -. ^M3 ^ M 

PHE2]. 

A parallel development in image denoising pioneered by Osher and coworkers [29l [30] seeks 
to enforce edge detection by total variation minimization (TV-min) 

(4) min J \Vg\ s.t. J \g - f\ 2 < e 2 

where / is the noisy image and e is the noise level. The idea is that for the class of piecewise 
constant functions, the gradient is sparse and can be effectively enforced by TV-minimization. 

For digital images, TV-min approach to deblurring can be formulated as follows. Let 
/ G C pxq be a noisy complex- valued data of p x q pixels. Let T be the transformation 
from the true object to the ideal sensors, modeling the imaging process. Replacing the total 
variation in Q by the discrete total variation 

nItv = J2 V\^jW+W^jW, 

= (Ajp, A 2 g)(i,j) = (g(i + l,j) - g(i,j),g(i,j + 1) - g(ij)) 

we obtain 

(5) min II^Htv s.t. ||T#-/|| 2 <£ 
cf. [TIE]. 

In a breakthrough paper (3], Candes et al. show the equivalence of ^ to Q for a random 
partial Fourier matrix with noiseless data (e = 0) and obtain a performance guarantee of 
exact reconstruction of piece-wise constant objects from 

A main application of this present work is to extend the result of [3] to inverse scattering 
with noisy data. In this context it is natural to work with the continuum setting in which 
the object is a vector in an infinite dimensional function space, e.g. L 2 (M> d ). To fit into the 
CS's discrete framework, we discretize the object function by pixelating the ambient space 
with a regular grid of equal spacing t. 

The grid spacing £ can be thought of as the resolution length, the fundamental parameter 
of the discrete model from which all other parameters are derived. For example, the total 
number of resolution cells is proportional to £~ d , i.e. m = 0(£~ d ). As we will assume 
that the original object is well approximated by the discrete model in the limit £ — > 0, the 
sparsity s of the edges of a piecewise constant object is proportional to £ 1 ~ d , i.e. the object 
is non-fractal. It is important to keep in mind the continuum origin of the discrete model in 
order to avoid confusion about the small £ limit throughout the paper. 

First we introduce the notation for multi-vectors Y e C nxa! 

(6) ||Y|| V = (V||row,(Y)||f) , a,b>l 



where rowj(Y) is the jth row of Y. The 2,2-norm is exactly the Frobenius norm. To 
avoid confusion with the subordinate matrix norm [23], it is more convenient to view Y as 
multi-vectors rather than a matrix. 

We aim at the following error bounds. Let V be the discretized object and V an estimate 
of V. We will propose a compressive sampling scheme that leads to the error bound for the 
TV-minimizer V 

(7) || Al/ - Ay|| 2)2 = 0(e), £^0 
implying via the discrete Poincare inequality that 

(8) \\V-Vh = 0(j) 

independent of the ambient dimension. 

If V is the reconstruction by using a version of the greedy algorithm, Orthogonal Matching 
Pursuit (OMP) P2JE7], for multi- vectors then in addition to ([7| we also have 

(9) \\ V -V h = (^=) 

independent of the ambient dimension (Section [3J. We do not know if the bound ^ applies 
to the TV-minimizer. 

A key advantage of the greedy algorithm used to prove ^ is the exact recovery of the 
gradient support (i.e. the edge location) under proper conditions (Theorem |2j Section [3]). 
On the one hand, TV-min requires fewer data for recovery: O(s) for TV-min under RIP 
versus 0(s 2 ) for the greedy algorithm under incoherence where the sparsity s = 0(£ l ~ d ) 
as already mentioned. On the other hand, the greedy algorithm is computationally more 
efficient and incoherent measurements are much easier to design and verify than RIP. 

At heart our theory is based on reformulation of TV-min as CS of joint sparsity with linear 
constraints (such as curl-free constraint in the case of TV-min): BPDN for constrained joint 
sparsity (CJS) is formulated as 

(10) min ||Z|| 1;2 , s.t. ||Y-^(Z)|| 2) 2 <e, CZ = 
where 

<f(Z) = [3>iZi, . . . , <&dZd\ , Zj = the jth column of Z 
where £ represents a linear constraint. Without loss of generality, we assume the matrices 
C C nxm all have unit 2-norm columns. 
In connection to TV-min, Zj is the j-th directional gradient of the discrete object V. 
And from the definition of discrete gradients, it is clear that every measurement of Zj can 
be deduced from two measurements of the object V, slightly shifted in the j-th direction 
with respect to each other. As shown below, for inverse scattering <!?., = <&, Vj and £ is the 
curl-free constraint which takes the form 

A X Z 2 = A 2 Z 1 

for d = 2 (cf. (53)). Our main results, Theorem [TJ and Theorem |2j constitute performance 



guarantees for CJS based, respectively, on RIP and incoherence of the measurement matrices 



" 3 ' 
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1.1. Comparison of existing theories. The gradient-based method of [26] modifies the 
original Fourier measurements to obtain Fourier measurements of the corresponding vertical 
and horizontal edge images which then are separately reconstructed by the standard CS 
algorithms. This approach attempts to take advantage of usually lower separate sparsity 
and is different from TV-min. Nevertheless, a similar 2-norm error bound (Proposition V.2, 
[26] ) to ^ is obtained. 

Needell and Ward [2S] obtain interesting results for the anisotropic total variation (ATV) 
minimization in terms of the objective function 

IMIatv = Yl \ A i9(hj)\ + \A 2 g(i,j)\- 

While for rea/-valued objects in two dimensions, the isotropic TV semi-norm is equivalent 
to the anisotropic version, the two semi-norms are, however, not the same in dimension 
> 3 and/or for complex- valued objects. A rather remarkable result of [25] is the bound 
|| V — V\\2 = 0(e), modulo a logarithmic factor, for d = 2. This is achieved by proving a 
strong Sobolev inequality for two dimensions under the additional assumption of RIP with 
respect to the bivariate Haar transform. Unfortunately, this latter assumption prevents 
the results in [25] from being directly applicable to structured measurement matrices such 
as Fourier-like matrices which typically have high mutual coherence with any compactly 
supported wavelet basis when adjacent subbands are present. Their approach also does not 
guarantee exact recovery of the gradient support. 

It is worthwhile to further consider these existing approaches from the perspective of the 
CJS framework for arbitrary d. The approach of [26] can be reformulated as solving d 
standard BPDN's 

min HZrlli, s.t. \\Y T — <frZ r || 2 < e, r—l,...,d. 

separately without the curl-free constraint £ where Z T and Y T are, respectively, the r-th 
columns of Z and Y. To recover the original image from the directional gradients, an 
additional step of consistent integration becomes an important part of the approach in |26j . 

From the CJS perspective, the ATV-min considered in [25] can be reformulated as follows. 
Let Z G C dm be the image gradient vector by stacking the d directional gradients and let 
Y G C dn be the similarly concatenated data vector. Likewise let $ = diag($i, . . . , <S>d) £ 
C dnxdm be the block-diagonal matrix with blocks $j G C nxm . Then ATV -min is equivalent 
to BPDN for a single constrained and concatenated vector 

(11) min||Z||i, s.t. ||y-*Z|| 2 <e, CZ = 0. 

where C is the same constrain £ reformulated for concatenated vectors. Repeating verbatim 
the proofs of Theorems [l] and [2] we obtain the same error bounds as Q-Q for ATV- mm as 



formulated in (11) under the same conditions for $j separately. 

ATV-min is formulated differently in |25j. Instead of image gradient, it is formulated 
in terms of the image to get rid of the curl-free constraint. To proceed the differently 
concatenated matrix [$!,..., is then assumed to satisfy RIP of higher order demanding 
2dn measurement data. For d = 2, RIP of order 5s with 5^ s < 1/3 is assumed for <fr 2 ] 
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in [23] which is much more stringent than RIP of order 2s with <5 2s < v2 — 1 for $ 1; $ 2 



separately in (11). In particular, $x = <fr 2 is allowed for (11) but not for [2H]. To get the 
favorable 0(e) 2-norm error bound for d — 2, additional measurement matrix satisfying 
RIP with respect to the bivariate Haar basis is needed, which, as mentioned above, excludes 
partial Fourier measurements. 

1.2. Organization. The rest of the paper is organized as follows. In Section [2j we present 
a performance guarantee for BPDN for CJS and obtain error bounds. In Section [3j we 
analyze the greedy approach to sparse recovery of CJS and derive error bounds, including 
an improved 2-norm error bound. In Section |4| we review the scattering problem starting 
from the continuum setting and introduce the discrete model. In Section [5j we discuss 
various sampling schemes including the forward and backward sampling schemes for inverse 
scattering for point objects. In Section[6]we formulate TV-min for piecewise constant objects 
as BPDN for CJS. We present numerical examples and conclude in Section [7} We present 
the proofs in the Appendices. 

2. BPDN FOR CJS 
Consider the linear inversion problem 

(12) Y = v?(X) + E, £X = 
where 

<f(X) = $ 2 X 2 , . . . , $ d X d ), e C nxm 
and the corresponding BPDN 

(13) min||Z|| lj2 , s.t. ||Y - <p(Z)\\ 2t2 < e = ||E|| 2i2 , £Z = 0. 

For TV-min in d dimensions, = Vj, X represents the discrete gradient of the unknown 
object V and £ is the curl-free constraint. Without loss of generality, we assume the matrices 
all have unit 2-norm columns. 

We say that X is s-row sparse if the number of nonzero rows in X is at most s. With a 
slight abuse of terminology we call X the object (of CJS). 

In the following theorems, we let the object X be general, not necessarily s-row sparse. 
Let X 1 - 5 -* consist of s largest rows in the 2-norm of X. Then X^ is the best s-row sparse 
approximation of X. 

Theorem 1. Suppose that the linear map if satisfies the RIP of order 2s 

(14) (1 - ^)||Z||| 2 < ||^(Z)||| 2 < (1 + 5 2s )||Z|| 2 2 2 
for any 2s-row sparse Z with 

5 2s < V2-1. 



Let X be the minimizer of (13). Then 
(15) l|X-X|| 2)2 < ds-^HX-XWlta + Cze 

for absolute constants Ci,C 2 depending only on <5 2s . 



Remark 1. Note that the RIP for joint sparsity (14) follows straightforwardly from the 
assumption of separate RIP 

(i-<5 2 .)M< 11*^11! <(i + * 2 .)M, vj 

with a common RIC. 

Remark 2. For the standard Lasso with a particular choice of regularization parameter, 
Theorem 1.3 of [I] guarantees exact support recovery under a favorable sparsity constraint. 
In our setting and notation, their TV-min principle corresponds to 

(16) min A<7||Z||i i2 + J||Y - ^(Z)||| 2 , A = 2^2 logm 

where a 2 = e 2 /(2n) is the variance of the assumed Gaussian noise in each entry ofY. 
Unfortunately, even if the result of jl] can be extended to (16), it is inadequate for our 
purpose because jl] assumes independently selected support and signs, which is clearly not 
satisfied by the gradient of a piecewise constant object. 

The proof of Theorem [T] is given in Appendix A. 

The error bound (15) implies ^ for s-row sparse X. For the 2-norm bound ([8]), we apply 
the discrete Poincare inequality [T2] 

"< =3-11 a/IB 



to get 



(17) l|V-V|| 2 <^C 2 e = (</ . 

3. Greedy pursuit for CJS 

One idea to improve the error bound is through exact recovery of the support. This can 
be achieved by greedy algorithms. As before, we consider the general linear inversion with 
CJS (|12| with ||E|| 2i2 = e. 

Our following algorithm is an extension of the joint-sparsity greedy algorithms of [T5"l QUI 
to the setting with multiple sensing matrices. 



Algorithm 1. OMP for joint sparsity 

Input: {&j}, Y,rj > 

Initialization: X° = 0, R° = Y and S° = 

Iteration: 

1) «max = argmaxj Ylj=i where $*j is the conjugate transpose of i-th column of <3?j 

2) S 1 1 {/ !:mx } 

3) X fc = argmin ||$Z - Y|| 2j2 s.t. supp(Z) C S k 

4) R fe = Y — <j>{X k ) 

5) Stop if EjWRjh <£■ 
Output: X k . 
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Note that the linear constraint is not enforced in Algorithm 1. 

A natural indicator of the performance of OMP is the mutual coherence ^ [19] . Let 

/i max = max/i($ i ). 

j 

Then analogous to Theorem 5.1 of [19], we have the following performance guarantee. 
Theorem 2. Suppose the sparsity s satisfies 

(18) s<^(l + — ^) — , X min = min||row fc (X)||i. 

Let Z be the output of Algorithm 1, with the stopping rule that the residual drops to the level 
e or below. Then supp(Z) = supp(X). 
Let X solve the least squares problem 

(19) X = arg min ||Y — $B|| 2i 2, s.t supp(B) C supp(X), £B = 0. 

B 

Then 

(20) ||X-X|| 2)2 < ~ 



\A ~ /Vax(-S — 1) 

The proof of Theorem [2] is given in Appendix B. 

The main advantage of Theorem [2] over Theorem [T] is the guarantee of exact recovery of 
the support of X. Moreover, a better 2-norm error bound follows because now the gradient 
error is guaranteed to vanish outside a set of cardinality 0[i x ~ d \. Let £hi C £L, I = 1, ...,L 
be the level sets of the object V such that 

L 

v = J2 v i l m 

i=i 



where L; PI L& = 0, 1 ^ k, L = U^L;. The reconstructed object V from X given in (19) also 
takes the same form 

L 

i=i 

To fix the undetermined constant, we may assume that v\ = V\. Since 

mv-v)\\ 2)2 = o{s) 



by (20) and the gradient error occurs only on the boundaries of £hi of cardinality OiJL 1 d ) 
we have 

\v l -vi\ = 0(e6 d - 1)/2 ) 1 VI. 

Namely 

\\V -V\\ 00 = 0{e£ {d - 1)/2 ) 
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and thus 

4. Application: inverse scattering 

In this section, we discuss the main application of the CJS formulation, i.e. the TV-min 
for inverse scattering problem. 

A monochromatic wave u propagating in a heterogeneous medium characterized by a 
variable refractive index n 2 (r) = 1 + v(r) is governed by the Helmholtz equation 

(21) V 2 w(r) +w 2 (l + t>(r))w(r) =0 

where v describes the medium inhomogeneities. For simplicity, the wave velocity is assumed 
to be unity and hence the wavenumber u equals the frequency. 
Consider the scattering of the incident plane wave 

(22) ^(r) = e^ r ' a 

where d is the incident direction. The scattered field u s = u — u l then satisfies 

(23) V V + w V = -co 2 vu 
which can be written as the Lippmann-Schwinger equation: 

(24) u s (r) = co 2 [ v(r') (n i (r / ) + n s (r'))G(r,r , )rfr / 

where G is the Green function for the operator —(V 2 + u 2 ). 

The scattered field necessarily satisfies Sommerfeld's radiation condition 

lim r^W^-iwV = 
r->oo V or J 

reflecting the fact that the energy which is radiated from the sources represented by the right 
hand side of ( 23 ) must scatter to infinity. 

Thus the scattered field has the far-field asymptotic 

(25) M s (r) = ^ ri ^(A(r,d,a;) + 0(|rr 1 )), r = r/|r|, 

where A is the scattering amplitude and d the spatial dimension. In inverse scattering theory, 
the scattering amplitude is the measurement data determined by the formula [H] 

A(r,d,u) = ^ J dr'v(r')u(r')e-^'' f 
which under the Born approximation becomes 

(26) A(r,d,u) = ^ J dr'v(r')e lujr '-^ 
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For the simplicity of notation we consider the two dimensional case in detail. Let L C Z 2 
be a square sublattice of m integral points. Suppose that s point scatterers are located in a 
square lattice of spacing £ 

(27) £h = {rj = £(px,p 2 ) ■ j = (pi - l)Vm + p 2 ,p = (pi,P2) e L} . 

In the context of inverse scattering, it is natural to treat the size of the discrete ambient 
domain £h being fixed independent of the resolution length £. In particular, m ~ £~ 2 in two 
dimensions. 

First let us motivate the inverse scattering sampling scheme in the case of point scatterers 
and let Vj,j = 1, ...,m be the strength of the scatterers. In other words, the total object is 
a sum of 5-functions 

(28) v(r) = J2v j 8(r-v j ). 

j 

Let S = {iv : j — 1, s} be the locations of the scatterers. Hence Vj = 0, Vr, ^ S. 
For point objects the scattering amplitude becomes a finite sum 

,2 m 



(29) A(r,d,u) = r J2 v i e ""' ! 



^ " iuirj-(d—r) 



An 

3=1 

In the Born approximation the exciting field u(rj) is replaced by the incident field u^rj). 

5. Sampling schemes 



Next we review the sampling schemes introduced in [21] for point objects (28). 
Let d;, ii, I = 1, n be various incident and sampling directions for the frequencies ui, I = 
1, ...,n to be determined later. Define the measurement vector Y = (yi) 6 C n with 

(30) yi= A(rt,dt,u)i), l = l,...,n. 

The measurement vector is related to the point object vector X = (vj) G C m by the sensing 
matrix $ as 

(31) Y = <f>X + E 

where E is the measurement error. Let 6i,6i be the polar angles of d/,r;, respectively. The 
(I, j)-entry of $ G C nxm is 

(32) n~ 1 ^ 2 e~ iuiltl ' Tj e iull ^ l ' rj = n - 1 / 2 e i ^^(P2(sme i -sine i )+pi(cose i -cose ; )) ) j — ^ _ \\ + p 2 

Note that $ has unit 2-norm columns. 

Let (£z,0) be i.i.d. uniform random variables on [—1, l] 2 and let pi,4>i be the polar coor- 
dinates as in 



(33) (6, CO =P/(cos^,sin^), pi = ^tf + C? < V2 

Let the sampling angle Q\ be related to the incident angle Q\ via 

(34) Q l + 0, = 20, + vr, 
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and set the frequency oji to be 
(35) 



topi 



sm 



where f2 is a control parameter. Then the entries (32) of the sensing matrix $ under the 
condition 



(36) ill = tt/V2 

are those of random partial Fourier matrix 



(37) 



I 



l,...,n, 



Pi,p 2 = 1, y/m. 



We consider two particular sampling schemes: The first one employs multiple frequencies 
with the sampling angle always in the back-scattering direction resembling the imaging 
geometry of synthetic aperture radar; the second employs only single high frequency with 
the sampling angle in the forward direction, resembling the imaging geometry of X-ray 
tomography. 

I. Backward Sampling This scheme employs Q— band limited probes, i.e. uji G [—12,0]. 
This and (35) lead to the constraint: 



(3i 



. e l -e l 

sm — - — 



> 



Pl_ 
V2' 



A simple way to satisfy (34) and (38) is to set 

(39) & = ~e l = e l -Ti, 

x/2 



(40) 



I = 1, ...,n. In this case the scattering amplitude is sampled exactly in the backward direc- 
tion, resembling SAR imaging. In contrast, the exact forward sampling with &i = 9i almost 
surely violates the constraint ( 38 ) . 

II. Forward Sampling This scheme employs single frequency probes no less than Q: 
(41) coi = -ffl, 7>1, / = l,...,n. 

We set 

Pi 



(42) 
(43) 



0, 
0j = 



<pi + arcsin 
ij — arcsin - 
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lV2 
Pi 

x/2' 



The difference between the incident angle and the sampling angle is 

Pi 



(44) 



9i — 9i = 2 arcsin 
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V2 



which diminishes as 7 — > 00. In other words, in the high frequency limit, the sampling angle 
approaches the incident angle, resembling X-ray tomography [21]. 

6. PlECEWISE CONSTANT OBJECTS 

Next let us consider the following class of piecewise constant objects: 

r 



1 V 2 



2' 2 



(45) v(r) = ^ p I H (-- p ), □ = 

peL 

where Iq is the indicator function of the unit square □. As remarked in the Introduction, 
we think of the pixelated v as discrete approximation of some compactly support function 
on M 2 and having a well-defined limit as £ — > 0. Set V = (vj) G C m , j = (pi — l)^/rn + p 2 . 

The discrete version of (26) is, however, not exactly the same as (29) since extended 
objects have different scattering properties from those of point objects. 

The integral on the right hand side of ([26]), modulo the discretization error, is 



J pGL J 

Now letting d;,f/,a;;,/ = 1, • • • ,n be selected according to Scheme I or II and substituting 
them in the above equation, we obtain 

= e y v efr(pi6-n*tt) 2 sin (7r ^ /2) 2 sin ( ^ /2) 
pk p n & m 

Let 

Xj = fv p , j = (pi - l)y/m + p 2 

and 

47T 

yi = r- A{ri, di,ui) + E h l = l,---,n 

where 

2sin(7r^/2)2sin(vr^/2) 

9i = ~, 

■nil irrji 

where E — (e{) is the noise vector. 

Define the sensing matrix $ = [<pkp] as 

(46) <p kp =^=j*Wk-H*r,k) p= ( pi _ 1)v ^ + p2? PljP2 = l j ..., V ^. 
'n 



Then (26) can be written in the same form as (31) 

(47) Y = <frX + E, X = ( Xj ) 

where the data and error vectors have been modified as above to account for the differences 
between extended and point objects. 
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Our goal is to establish the performance guarantee for TV-min 
(48) min ||Z||tv, subject to \\Y - &Z\\ 2 < \\E\\ 2 . 



And we accomplish this by transforming (48) into BPDN for CJS (13). 
Define X = {X h X 2 ) with 

(X ll X 2 ) = f(A 1 V,A 2 V) GC mx2 
Suppose the support of {v p+ei ,v p+e2 } is contained in L. Simple calculation yields that 



= -e I ^yve 1 e il(ftf ' +M,) 
peL 

1)2 



and thus 

(49) (e-^-l)yi = 



(50) (e-^-lJw = 



1 ^ >p+«* ~ v p )e^' +p ^) 

pSL 

2 



n 

peL 



V p +e 2 - v P )e 



Define Y = (Y U Y 2 ) with 

Yl = (( e -^« - 1)3,,) , F 2 = (( e -^ - %) e C 
and E = {E X: E 2 ) with 

(51) £7i = ((e-^ - l)e,) , £ 2 = ((e"^ - l)e,) G C\ 



We rewrite (47) in the form 

(52) Y = $X + E. 
subject to the constraint 

(53) AxX 2 = A 2 Xi 

which is the discrete version of curl-free condition. This ensures that the reconstruction by 
line integration of (v p ) from X is consistent (i.e. path-independent). 



To see that (53) is necessary and sufficient for the recovery of (v p ), consider, for example, 
the notations in Figure [T] and suppose i>o,o is known. By definition of the difference operators 
Ai, A 2 we have 

vi,o = v 0fi + (AiV) ,o 

Vo,l = v 0fi + (A 2 K)o,0 
12 
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Figure 1. Consistency among cells C,C and C" . 



In general, we can determine v p , p G L iteratively from the relationship 

v p+ei = v p + (AiV)p 
v P+e2 = v p + (A 2 V) P 

and the knowledge of V at any grid point. The path-independence in evaluating v pi+ i )P2+ i 



v 



J Pl ,P2 



(A 1 V) 



Pi :P2 



(A 2 V) 



+ (A 2 V) PUP2 + (A 1 V) 



Pl+l,P2 
Pl,P2 + l 



implies that 



(A 2 V) P1+1 , P2 - (A 2 V) 



IPX-P2 



(AiV) 



P1,P2 + 1 



(A x y) 



P1,P2 



which is equivalent to (53). 



Now eq. (47) is equivalent to (52) with the constraint (53) provided that the value of V 



at (any) one grid point is known. The equivalence between the original TV-min (48) and 
the CJS formulation (13) with $j = Vj then hinges on the equivalence of their respective 



feasible sets which can be established under the assumption of Gaussian noise. When E in 



(47) is Gaussian noise, then so is E and vice versa, with variances precisely related to each 



other. 
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Figure 2. The original 256 x 256 Shepp-Logan phantom (left), the Shepp- 
Logan phantom and the magnitudes of its gradient with sparsity s = 2184. 



0(s), up to a 
Therefore (18) 



The random partial Fourier measurement matrix satisfies RIP with n 
logarithmic factor [3], while its mutual coherence jj, behaves like 0(n~ 1 / 2 ) 
impies the sparsity constraint s = 0(y/n) for the greedy approach which is more stringent 
than s = 0(n) for the BPDN approach. 

7. Conclusion 

We have developed a general CS theory (Theorems [I] and [2]) for constrained joint sparsity 
with multiple sensing matrices and obtained performance guarantees parallel to those for 
the CS theory for single measurement vector and matrix. 

From the general theory we have derived 2-norm error bounds for the object and the gra- 
dient, independent of the ambient dimension, for TV-min and greedy estimates of piecewise 
constant objects. 

In addition, the CJS greedy algorithm can recover exactly the gradient support (i.e. the 
edges of the object) leading to an improved 2-norm error bound. Although the CJS greedy 
algorithm needs a higher number of measurement data than TV-min for Fourier measure- 
ments the incoherence property required is much easier to check and often the only practical 
way to verify RIP when the measurement matrix is not i.i.d. or Fourier. 

We end by presenting a numerical example demonstrating the noise stability of the TV- 
min. Efficient algorithms for TV-min denoising/deblurring exist [U [34]. We use the open 
source code Ll-MAGIC ( |http : / /users . ece . g atech . edu/p ustin/llmagic/) for our sim- 
ulation. 

Figure [2] shows the 256 x 256 image of the Shepp-Logan Phantom (left) and the modulus 
of its gradient (right). Clearly the sparsity (s = 2184) of the gradient is much smaller than 
that of the original image. We take 10000 Fourier measurement data for the Ll-min ([I]) and 
TV-min ^ reconstructions. 
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Figure 3. Noiseless Ll-min reconstructed image (left) and the differences 
(middle) from the original image. The plot on the right is the gradient of the 
reconstructed image. 

Because the image is not sparse, Ll-min reconstruction produces a poor result even in the 
absence of noise, Figure [3j The relative error is 66.8% in the L2 norm and 72.8% in the TV 
norm. Only the outer boundary, which have the largest pixel values, is reasonably recovered. 

Figure [1] shows the results of TV-min reconstruction in the presence of 5% (top) or 10% 
(bottom) noise. Evidently, the performance is greatly improved. 

Appendix A. Proof of Theorem Q] 
The argument is patterned after [2] with adaptation to the CJS setting. 
Proposition 1. We have 

|^(^(Z),^(Z , ))|<W||Z|| 2)2 ||Z , || 2j2 
for all Z, Z' supported on disjoint subsets T,T' C {1, ...,m} with \S\ < s, \S'\ < s'. 
Proof. Without loss of generality, suppose that ||Z|| 2>2 = ||Z'|| 2i2 = 1. Since Z _L Z', ||Z ± 



Z'|| 2 2 = 2- Hence we have from the RIP (14) 



(54) 2(1 - 5 S+S ,) < \\<p(Z ± Z')\\l 2 < 2(1 + 5, 



By the parallelogram identity and (54) 

|S (<p(Z), <p(Z>))\ = \ I ||^(Z) + ^(Z')||* 2 - |k(Z) - v»(Z')||^| < Ss 
which proves the proposition. 



By the triangle inequality and the fact that X is in the feasible set we have 
(55) MX - X)|| 2)2 < ||^(X) - Y|| 2)2 + ||Y - <f(X)\\ 2>2 < 2e. 
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Figure 4. TV-reconstructed image with 5% (top left) and 10% (bottom left) 
and the respective differences (middle) from the original image. The plots on 
the right column are the magnitudes of the reconstructed image gradients. 

Set X = X + D and decompose D into a sum of Dg , D^, Dg 2 , each of row-sparsity at 
most s. Here So corresponds to the locations of the s largest rows of X; Si the locations of 
the s largest rows of D 5 g; S 2 the locations of the next s largest rows of Dgg, and so on. 
Step (i). Define the norm 

||Z||oo,2 = max ||roWj(Z)|| 2 . 

3 

For j > 2, 

||Ds,|| 2 , 2 < s 1/2 1| B Sj ||oo,2 < s-^HDs^JI^ 

and hence 

(56) H D 5j2, 2 < s- 1/2 J2 l|D5ji,2 < s" 1/2 ||D 5 =||i,2. 

i>2 j>i 

This yields by the Cauchy-Schwarz inequality 

(57) ^(Sous^cl^^ = || y^Dg 7 ||2,2 < ^ ||DgJ|2,2 < s _1 ^ 2 ||D5c|| 1)2 . 

j>2 j>2 
16 



Also we have 



|X||i )2 > ||X|| li2 
= ll x 

> ||X So ||i t 2 



Xs + D 5o || li2 + ||Xsg + D S g|| lj2 



|Ds 1 1 1 ,2 — ||Xsfg||i )2 + ||Dsg||i,2 
which implies 

(58) ||D 5 g||i,2<2||X 5S || li2 + ||D 5o || li2 . 

Note that ||X5g|| 1)2 = ||X — XW|| lj2 by definition. Applying (57), (58) and the Cauchy- 
Schwartz inequality to ||Ds ||i )2 gives 

( 59 ) ||D(5 u5i)=||2,2 < ||D So || 2j2 + 2e 

where e = s- 1/2 ||X - ^ \\ lj2 . 

Step (ii). Define the inner product 



Observe 
(60) 



lk(D 



5 USj|| 2 ,2 



(V9(D 50U51 ),V9(D)) - L(D So us 1 ),J2^' Ds : 
» (<p(D So u Sl ), <p(P)) - WWJ, <p(p Si )) 



i>2 



^(D 5oU5l ),y,(D)> -J2 W^(Dsc),^)) + »^(D Sl ) >¥ »(D 5j )>] - 

i>2 



From (|55J) and the RIP ([14]) it follows that 

|(^(D SoUSl ),^(D))| < ||p(D SoUSl )|| 2 , 2 ||^(D)|| 2)2 <2£v/l + 5 2s ||D 5oU5l || 2 , 2 . 
Moreover, it follows from Proposition [T] that 

(61) \&{<p(T> So ),<p(T> Sj ))\ < 5 2s ||D 5o || 2i2 ||D 5 J 2 , 2 

(62) \&(<p(T> Sl ),<p(T> Sj ))\ < 5 2s ||D 5o || 2)2 ||D 5 .|| 2)2 
for j > 2. Since So and Si are disjoint 

V^||D So u Sl || 2 ,2. 



D 



So II 2,2 



D 



Si 1 1 2,2 



< v^a/hd^iii , + ||D 



'Slll2,2 



Also by (60)-(62) and RIP 



(1 - 52s)||D 5oU5l || 2 2 < 



D 



5 USJ|| 2 .2 



< 



|D5 u5il|2,2 ^25 v 7 ! + 5 2s + S 2s ll D ^ lh.2 j • 
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Therefore from (56) we obtain 



ir» II ^ -l/2nr» II 2^1 + S 2s V25 2s 
|D So u5i||2,2 < ae + ps 1 ||D S g|| lj2 , a = — — , p 



and moreover by ( 58 ) and the definition of eo 

||D So uSil|2,2 < ae + p||Ds || 2 ,2 + 2pe 
after applying the Cauchy-Schwartz inequality to bound ||Ds ||i i2 by s 1/,2 ||Ds 1| 2)2 . Thus 

||Ds uSilk2 < (1 - pY 1 {ae + 2pe ) 



if Q holds. 
Finally, 

||D|| 2 ,2 < HDsoUS 1 ! || 2,2 + ||D(5 USi) c ||2,2 

< 2||D 5oU5l || 2i2 + 2e 

< 2(1- P y 1 (ae + (1 + p)eo) 

which is the desired result. 

Appendix B. Proof of Theorem [2] 

We prove the theorem by induction. 
Let supp(X) — S — { Ji, . . . , J s } and 

X max = Hrow^X)!!! > Hrow^X)!!! > > ^ow^X)^ = X min 

. In the first step, 

d d 

( 63 ) EW,* y il = E A + ••• + + Vil 

i=i j'=i 

> -^max ~ ^max( s — l)/ i max — || Ej || 2 . 

J 

On the other hand, V/ ^ supp(X), 

(64) )T ' l, p Y > = Yl x >j<i>;,/h. h ■ .v,,/i-;,'i-,.,, • ... • .Y,_,<i-;,<i-„_ • <i-;,/- ; 

^ ^ max s/i max -|- ^ ^ ||_£/j|| 2 . 

3 

Hence, if 

(2s-l)/w+ ^" — <1, 



then the right hand side of (|63j) is greater than the right hand side of (64) which implies 
that the first index selected by OMP must belong to supp(X). 
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To continue the induction process, we state the straightforward generalization of a stan- 
dard uniqueness result for sparse recovery to the joint sparsity setting (Lemma 5.3, [19J). 



Proposition 2. Let Z = ip(K) and Y = Z + E. Let S k be a set of k indices and let A 6 £ nxd 
with supp(A) = S k . Define 

(65) Y' = Y - if (A) 
and 

Z' = Z-(p(A). 

Clearly, Y' = Z' + E. If S k C supp(X) and the sparsity s o/X satisfies 2s < 1 + /i max , then 
71 has a unique sparsest representation 71 = y?(X') with the sparsity o/X' at most s. 

Proposition [2] says that selection of a column, followed by the formation of the residual sig- 
nal, leads to a situation like before, where the ideal noiseless signal has no more representing 
columns than before, and the noise level is the same. 

Suppose that the set S k C supp(X) of k distinct indices has been selected and that A in 
Proposition [2] solves the following least squares problem 

(66) A = argmin ||Y - $B|| 2i2 , s.t. supp(B) C S k 

without imposing the constraint C This is equivalent to the concatenation A = [Aj] of d 
separate least squares solutions 

(67) Aj = argmin \\Yj - &jBj\\ 2 , s.t. supp( J B j ) C S k 



Let *&j,s k be the column submatrix of $j indexed by the set S . By (65) and (67), 
<&* sk Y- = 0, Vj, which implies that no element of S k gets selected at the (k + l)-st step. 

In order to ensure that some element in supp(X) \ S k gets selected at the (k + l)-st step 
we only need to repeat the calculation (l63l-([64|) to obtain the condition 



(68) (2s - l)/i max + fy" , < 1. 

\\ A J k +i\\i 



Since J2j W^jh - v^HE^^ = Vde (68) follows from 



(69) (2s - l)/i max + —— < 1 

Ar, 



L mm 



which is the same as (18) and allows us to apply Proposition [2j repeatedly. 

By the s-th step, all elements of the support set are selected and by the nature of the least 
squares solution the 2-norm of the residual is at most e. Thus the stopping criterion is met 
and the iteration stops after s steps. 
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On the other hand, it follows from the calculation 

Era 2 > £i<w7i 

3 i =1 



j i=k+2 

> ||row Jfc+1 (X)||i-^w(s-A;-l)||iowj fc+3 (X)||i-^||£y||2 



j 



> (1 " /W(« - fc - l))||rOW Jfc+1 (X)|| 1 - \\ E i\U 



and (69) (equivalently, X min (l — /i max (2s — 1)) > 2\fde) that ||Y|| lj2 > Vde for A; 
0, 1, • • • , s — 1. Thus the iteration does not stop until k = s. 

Since X be the solution of the least squares problem (19), we have 

||Y - $X|| 2>2 < ||Y - *X|| 2 ,2 < e 

and 

||*(X - X)||l i2 < 2||Y - *X||1 )2 + 2||Y - *X||l i2 < 2e 2 

which implies 

|| X — X || 2,2 < V^^/Amin 

where 

A m in = minjthe s-th singular value of the column submatrix of 3>j indexed by S} 



The desired error bound (20) can now be obtained from the following result (Lemma 2.2, 



Proposition 3. Suppose s < 1 + Every m x s column submatrix of <&j has the 

s-th singular value bounded below by v/l — /i(<frj)(s — 1). 



By Proposition |3| A min > y 1 — /U max (s — 1) and thus 

|| X — X || 2 2 ^ — / 

v/1 - A*max(s - 1" 
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